[00:05.330 --> 00:12.050]  I'm watching myself on the stream. There we go. So, hi everybody. I'm Hank Leininger, a co-worker
[00:12.050 --> 00:17.890]  and fellow organizer of some of the events, like the Password Village and the Crack Me If You Can
[00:17.890 --> 00:26.250]  contest. So, I'm going to give a talk on a defensive technique, and the open
[00:26.250 --> 00:30.950]  source tool that we ended up developing after sitting around thinking about all the interesting
[00:30.950 --> 00:37.310]  ways that we can make password cracking easier. We asked ourselves, okay, well,
[00:37.310 --> 00:40.270]  can we turn those into ways to make password cracking harder?
[00:41.330 --> 00:46.170]  The idea, in a nutshell, is that we're going to learn from successful attacks
[00:46.170 --> 00:51.370]  and use them to inform how we build better defenses.
[00:52.570 --> 00:59.230]  Just a little about me. I came up wearing a sysadmin hat and, you know, got into security
[00:59.230 --> 01:04.370]  that way. I still enjoy building stuff about as much as I enjoy breaking it.
[01:04.910 --> 01:13.190]  And, yeah, I sometimes do talks and build things. And PathWell, the topic of this
[01:13.190 --> 01:22.870]  talk, like I said, came about as just a brainstormed idea, but my co-workers
[01:23.590 --> 01:28.350]  actually turned it into running code that we released. So, kudos to them.
[01:29.230 --> 01:36.070]  So, the audience probably includes people that can't spell hashcat and people who wrote hashcat.
[01:36.550 --> 01:43.810]  So, I'm going to give a very oversimplified overview of a particular kind of cracking
[01:43.810 --> 01:49.590]  technique for the sake of people for whom this is not second nature. And those of you who
[01:49.590 --> 01:54.190]  know that I'm oversimplifying, don't at me, bro.
[01:55.750 --> 02:02.450]  So, we have a bunch of classic defenses against password cracking. I've got to keep in mind that
[02:02.450 --> 02:11.710]  the slides update way slower than I do. All of these standard old ways that we try to make passwords
[02:11.710 --> 02:18.030]  stronger, yeah, they basically don't work. There's a bunch of talks that go into a lot more detail
[02:18.030 --> 02:29.070]  than I'm going to, but password complexity rules, they helped a little when they were
[02:29.630 --> 02:36.430]  newly enforced, but it turns out humans make very similar decisions in the face of similar
[02:36.430 --> 02:42.590]  policies. And password crackers have figured out the way humans respond to being given certain
[02:42.590 --> 02:49.450]  rules. And we can guess where everybody's going to end up. As soon as we know the password policy
[02:49.450 --> 02:54.430]  of a company, we can guess what a lot of those users' passwords are going to look like. And if
[02:54.430 --> 03:00.490]  we know what their passwords were a year ago, and we know their rotation requirements, we can
[03:00.490 --> 03:08.890]  probably guess what they are today. Just to illustrate some examples, I want to throw some
[03:08.890 --> 03:15.170]  numbers up here, which are rough. Again, don't at me. But suppose you built yourself a budget
[03:15.170 --> 03:20.430]  password cracking rig, and congratulations, you can exhaust the entire eight character key space
[03:20.430 --> 03:27.930]  in a day or less. But once password lengths get up to 9, 10, 11 characters, that becomes
[03:27.930 --> 03:36.390]  infeasible. However, nobody actually tries to brute-force the entire key space. We figure out
[03:36.390 --> 03:42.110]  where the most relevant, bang-for-the-buck places are to spend our compute time.
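A rough back-of-the-envelope version of the budget-rig numbers, as a Python sketch. It assumes the 95 printable ASCII characters and a hypothetical rate of 150 billion guesses per second against a fast hash; that rate is an illustrative assumption picked so the 8-character space falls in under a day, not a benchmark from the talk.

```python
# Rough time to exhaust every password of a given length by brute force.
# ASSUMPTIONS: 95 printable ASCII characters and a hypothetical budget rig
# doing 150 billion guesses per second against a fast hash.
CHARSET_SIZE = 95
GUESSES_PER_SECOND = 150e9
SECONDS_PER_DAY = 86_400


def keyspace(length: int, charset_size: int = CHARSET_SIZE) -> int:
    """Number of candidate passwords of exactly `length` characters."""
    return charset_size ** length


def days_to_exhaust(length: int, rate: float = GUESSES_PER_SECOND) -> float:
    """Days needed to try every candidate of that length at `rate` guesses/second."""
    return keyspace(length) / rate / SECONDS_PER_DAY


for n in range(8, 12):
    print(f"{n} chars: {keyspace(n):.2e} candidates, ~{days_to_exhaust(n):,.1f} days")
```

Each extra character multiplies the work by 95, which is why 9, 10 and 11 characters go from weeks to years to millennia on the same hypothetical rig.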
[03:44.410 --> 03:51.790]  So the math and the proportions still apply, but we're not going after the entire key space. We're
[03:51.790 --> 03:59.670]  going after the most relevant tiny slice of the password key space. So one way to
[03:59.670 --> 04:07.850]  divide up that key space is what Hashcat calls mask attacks, where you say, in this character
[04:07.850 --> 04:12.070]  position, it's going to be an uppercase letter, and I want you to try every uppercase letter,
[04:12.070 --> 04:16.630]  and in this position, I want you to try every number, and so on. And usually you don't do this
[04:16.630 --> 04:21.330]  for the entire password. You do it for, I want to take a dictionary word and then tack this pattern
[04:21.330 --> 04:27.950]  onto the end of it, or I want to take a dictionary word and apply rules that morph it in these ways
[04:27.950 --> 04:37.830]  in these positions. So again, just for the sake of figuring out how to discuss this and do the math,
[04:38.510 --> 04:45.190]  we'll use this way of looking at it in this notation. So suppose you wanted to try every
[04:45.190 --> 04:54.370]  possible password of a specific mask. And so the math that I've done here is,
[04:54.950 --> 05:02.170]  if we have four different character classes, uppercase letters, lowercase letters, numbers,
[05:02.170 --> 05:08.810]  and special characters, each position of the password could be one of those four. So essentially,
[05:08.810 --> 05:18.870]  for an eight-character password the key space might be 95 to the 8th, but the topology key space is 4 to the 8th.
[05:19.470 --> 05:27.530]  And if we have an 11-character password, there are 4 to the 11th different topologies.
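A minimal Python sketch of the topology idea, assuming the four classes map onto hashcat's built-in ?u, ?l, ?d and ?s charsets. The helper names `topology` and `topology_count` are hypothetical, and treating every non-alphanumeric character as special is a simplifying assumption.

```python
import string


def topology(password: str) -> str:
    """Return the hashcat-style mask (?u ?l ?d ?s) describing a password's shape."""
    classes = []
    for ch in password:
        if ch in string.ascii_uppercase:
            classes.append("?u")
        elif ch in string.ascii_lowercase:
            classes.append("?l")
        elif ch in string.digits:
            classes.append("?d")
        else:
            # Assumption: everything else (punctuation, space, non-ASCII) is "special".
            classes.append("?s")
    return "".join(classes)


def topology_count(length: int, n_classes: int = 4) -> int:
    """Number of distinct topologies of a given length: n_classes ** length."""
    return n_classes ** length


print(topology("Password1!"))
print(topology_count(8), topology_count(11), 95 ** 8)
```

Many different passwords collapse onto one mask, so the topology space (4 to the n) is vastly smaller than the password space (95 to the n).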
[05:27.530 --> 05:33.350]  If we pick the single topology of uppercase number, lowercase, lowercase, lowercase,
[05:33.350 --> 05:40.090]  whatever, number, number, special, we multiply out the possibilities in each of those positions.
[05:40.090 --> 05:45.290]  And we say to exhaust that entirely, which again, we would actually combine that with
[05:46.090 --> 05:50.950]  word lists or other things, so that we didn't have to blindly guess, we would guess the most
[05:50.950 --> 05:56.690]  frequent or the most likely ones. In any case, if you did want to exhaust the whole thing,
[05:56.690 --> 06:06.050]  there would be 265 trillion possible combinations. And that hypothetical cheapo cracking box, for which it
[06:06.730 --> 06:10.430]  would be totally infeasible to brute-force the entire key space,
[06:10.430 --> 06:13.650]  could do that one topology in about half an hour.
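The talk doesn't spell out the exact 11-character topology, so as an assumption, here is one mask that does come out at roughly 265 trillion: ?u?l?l?l?l?l?l?d?d?d?s. The sketch multiplies the class sizes per position (hashcat's ?s set is a space plus 32 punctuation marks, 33 in all) and reuses the same hypothetical 150-billion-guesses-per-second rig; `mask_keyspace` is an illustrative name.

```python
from math import prod

# Sizes of hashcat's built-in charsets ?u, ?l, ?d and ?s (space + 32 punctuation marks).
CLASS_SIZES = {"u": 26, "l": 26, "d": 10, "s": 33}
GUESSES_PER_SECOND = 150e9  # hypothetical budget-rig rate, an assumption


def mask_keyspace(mask: str) -> int:
    """Candidates covered by a mask built only from ?u/?l/?d/?s tokens."""
    tokens = [mask[i:i + 2] for i in range(0, len(mask), 2)]
    if any(len(t) != 2 or t[0] != "?" or t[1] not in CLASS_SIZES for t in tokens):
        raise ValueError(f"unsupported mask: {mask!r}")
    return prod(CLASS_SIZES[t[1]] for t in tokens)


mask = "?u?l?l?l?l?l?l?d?d?d?s"  # assumed example topology, 11 characters
space = mask_keyspace(mask)
print(f"{space:,} candidates, ~{space / GUESSES_PER_SECOND / 60:.0f} minutes")
```

At that assumed rate the single mask takes a bit under half an hour, while the full 95-character space at 11 characters stays out of reach.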
[06:15.870 --> 06:21.220]  So the question we have to ask, in evaluating the effectiveness of
[06:23.510 --> 06:30.770]  mask-based attacks, is: do users bias towards specific
[06:31.530 --> 06:38.930]  mask patterns, or specific topologies? If we could guess which patterns were overused by users,
[06:39.360 --> 06:44.830]  then we would focus our attacks just on those. And we wouldn't have to worry
[06:44.830 --> 06:50.510]  about 99.99% of the key space. We would only have to worry about the parts of the key space where
[06:50.510 --> 06:56.670]  the users are all clustered. And again, we would actually combine this with other smarts.
[06:56.670 --> 07:00.070]  So in actual fact, our work would be even less.
[07:01.830 --> 07:06.110]  So at our company, at our day job, we do a lot of password audits,
[07:06.110 --> 07:10.470]  both regular routine recurring ones and one-offs when we do a pen test.
[07:10.470 --> 07:13.030]  So we have a whole bunch of data to look at.
[07:14.710 --> 07:20.570]  Spoiler alert. Yes, of course users cluster into common topologies. That's why mask attacks are
[07:20.570 --> 07:26.270]  worth doing. But we wanted to get some science and some numbers behind that. So we analyzed
[07:27.510 --> 07:33.430]  the results of having cracked high percentages of the hashes from a number of different organizations.
[07:35.450 --> 07:41.210]  In the first case study I'm going to throw out numbers for, we cracked a little over
[07:41.210 --> 07:49.870]  99% of current and historical hashes. We didn't attack them with a
[07:49.870 --> 07:54.150]  mask-centric focus. We attacked them with all our regular techniques, but then we did the crunching
[07:54.150 --> 07:58.250]  and the math and we said, okay, how many different topologies were in here and what were the most
[07:58.250 --> 08:06.570]  frequent and how frequent were they? Out of those quarter million different logins and hashes,
[08:06.570 --> 08:16.610]  there were only around 7,000 unique topologies in use. And over 10% of the population used
[08:18.170 --> 08:23.590]  each of the three most popular topologies.
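The crunching described here boils down to a topology histogram over cracked passwords. A minimal sketch in Python, with a made-up sample list (not data from the talk) and a hypothetical `topology_histogram` helper:

```python
import string
from collections import Counter


def topology(password: str) -> str:
    """Hashcat-style mask for a password; non-alphanumerics count as ?s (assumption)."""
    def cls(ch: str) -> str:
        if ch in string.ascii_uppercase:
            return "?u"
        if ch in string.ascii_lowercase:
            return "?l"
        if ch in string.digits:
            return "?d"
        return "?s"
    return "".join(cls(ch) for ch in password)


def topology_histogram(cracked: list[str]) -> Counter:
    """Count how many cracked passwords fall into each topology."""
    return Counter(topology(pw) for pw in cracked)


# Made-up sample, not data from the talk.
cracked = ["Summer2019", "Winter2020", "Spring2021", "Monkey12", "Dragon99", "xK9#mQ2!vL"]
hist = topology_histogram(cracked)
for topo, count in hist.most_common():
    print(f"{count:>3}  {topo}")
```

Sorting the histogram with `most_common()` gives exactly the "most popular topologies first" ordering that an attacker would turn into a mask list.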
[08:24.990 --> 08:31.670]  So if we knew nothing about this organization whatsoever, other than that uppercase lower,
[08:31.670 --> 08:36.130]  lower, lower, lower number number was going to be highly successful, we would crack over 10%
[08:36.130 --> 08:44.130]  of their population without a second thought. And if we did all of the top five, we would
[08:44.130 --> 08:50.850]  have gotten almost half the company. If we did the top 100, which is still a tiny slice of the
[08:50.850 --> 08:57.970]  overall key space, we would have gotten 85% of all hashes. And by the way, you can also
[08:57.970 --> 09:02.390]  discern from the relationship between the number one most popular and the number two most
[09:02.390 --> 09:13.170]  popular, and between numbers three and five, that this organization made
[09:13.170 --> 09:18.610]  their password policy stronger sometime in the history of the dump that we have. Good for them,
[09:18.610 --> 09:24.650]  stronger password policy? No, totally worthless, because all their users did was make the word
[09:24.650 --> 09:31.310]  inside just one letter longer, adding another lowercase to that run. That was trivial for us to
[09:31.310 --> 09:41.030]  figure out and go after. Now this is a graphical depiction of how frequent the most
[09:41.030 --> 09:46.230]  frequent topologies were. Pay no attention to the labels on the x-axis. They don't mean anything
[09:46.230 --> 09:52.510]  other than the unique identifier that we assigned to each particular topology. The point being,
[09:52.510 --> 09:57.350]  the few most popular topologies, the ones that we've
[09:57.350 --> 10:03.530]  lined up on the left-hand side of the graph, were 10% or more of the population. And then
[10:03.530 --> 10:10.430]  it tailed off very, very quickly and steeply. So there were some users who were indeed special
[10:10.430 --> 10:15.730]  snowflakes, but they were like a handful of them way off to the right side of the graph.
[10:16.270 --> 10:22.390]  They don't help the organization as a whole stay safe because the people on the left-hand side that
[10:22.390 --> 10:29.590]  are all grouped together, they're easy prey. Another organization: kind of a similar story,
[10:29.590 --> 10:35.170]  slightly larger hash pool, slightly lower crack percentage at the time we did this crunching,
[10:35.170 --> 10:42.470]  but still well over 90%. So we were confident that the conclusions we
[10:42.470 --> 10:46.990]  were going to draw from this data were valid. By the way, to preempt a question that I would ask
[10:46.990 --> 10:53.510]  if I was watching this presentation, all of the subsequent stuff is based on number of cracks
[10:53.510 --> 10:58.450]  compared to total population, not number of cracks compared to total number of cracks.
[10:58.450 --> 11:07.370]  For instance, when I say that 19,000 hashes using that one topology is 4.3%,
[11:07.370 --> 11:17.170]  it's 4.3% of 449,000, not 4.3% of the 419,000 that we were successful at cracking. We don't know what the
[11:17.170 --> 11:23.990]  30,000 were that we didn't crack, but we know what percentage of the total the ones we did crack are.
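The denominator convention just described can be made explicit in a small sketch: divide by the total number of hashes, cracked or not, rather than by the number cracked. The counts below are hypothetical, and `top_n_coverage` is an illustrative name.

```python
from collections import Counter


def top_n_coverage(hist: Counter, total_hashes: int, n: int) -> float:
    """Share of a population that falls in the n most common topologies.

    Passing the total number of hashes (cracked or not) keeps uncracked
    hashes in the denominator, as in the percentages quoted in the talk.
    """
    covered = sum(count for _, count in hist.most_common(n))
    return covered / total_hashes


# Hypothetical counts: 1,000 hashes in total, 650 of them cracked.
hist = Counter({
    "?u?l?l?l?l?l?d?d": 300,
    "?u?l?l?l?l?l?l?d?d": 200,
    "?u?l?l?l?l?l?d?d?s": 100,
    "?l?l?l?l?l?l?l?l": 50,
})
total = 1_000
cracked = sum(hist.values())
print(f"top 2 vs. whole population: {top_n_coverage(hist, total, 2):.1%}")
print(f"top 2 vs. cracked only:     {top_n_coverage(hist, cracked, 2):.1%}")
```

Dividing by the cracked count instead would overstate how much of the organization each topology represents.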
[11:26.560 --> 11:34.340]  So in this organization, the numbers aren't as bad, but it's still a very significant relationship.
[11:34.340 --> 11:41.020]  And it's also, once again, the case that somewhere embodied in the password history
[11:41.020 --> 11:46.260]  that we captured, they changed their policies to make them stronger. But the way people responded
[11:46.260 --> 11:53.220]  to it was just to add another character at the end, which was a
[11:53.220 --> 11:58.740]  special, because they went from requiring eight characters, three of four, to requiring nine
[11:58.740 --> 12:02.880]  characters, four of four. So users are just like, oh, okay, I'm gonna put an exclamation point at
[12:02.880 --> 12:11.900]  the end. Once again, didn't stop us, you know, from cracking the new ones. This is the graph
[12:11.900 --> 12:18.680]  for that organization. It's not nearly as bad and ugly, but it is still bad and ugly.
[12:22.440 --> 12:29.540]  So we did this same kind of math across eight or so different large data compilations. We
[12:29.540 --> 12:34.840]  discarded anything that was too small; we didn't want a single
[12:37.000 --> 12:45.020]  outlier to skew the overall numbers too much. And we also limited it to ones that we had
[12:45.020 --> 12:51.540]  cracked a substantial portion, over 90%. Again, so that the conclusions we were able to draw
[12:51.680 --> 12:59.480]  were statistically valid for the population, with whatever caveats
[12:59.480 --> 13:08.380]  you had to take into account. This graphic is all of the histograms for each of
[13:08.380 --> 13:17.560]  those organizations, not sorted by what's popular for them, but basically stacking on top of each
[13:17.560 --> 13:25.360]  other, so that we can see clear patterns occur. This very first topology on the list, it was
[13:25.360 --> 13:32.180]  popular in all five of the organizations, or all eight that are included in this data set.
[13:32.180 --> 13:36.180]  Same thing, you know, the number three one was pretty popular. The number five one was hugely
[13:36.180 --> 13:43.660]  popular across everybody. The only organization that was substantially different from all the
[13:43.660 --> 13:49.340]  others is the one that was graphed in cyan or light blue here, where there's a huge spike
[13:49.340 --> 13:54.780]  and nobody else has a huge spike in that area. This was a big enough anomaly that we dug in to figure out
[13:54.780 --> 14:02.460]  why, and it's because that particular organization had a default password that was so commonly used
[14:02.460 --> 14:07.840]  and still used by such a huge percentage of their users, that it threw off
[14:07.840 --> 14:09.760]  all the rest of the numbers.
[14:12.040 --> 14:17.160]  And it was a specific outlier just for them. I don't even remember what it was, but imagine it
[14:17.160 --> 14:23.040]  was, you know, 1changeme: the number one, and then all lowercase. And even when people
[14:23.040 --> 14:29.440]  changed their password away from 1changeme, they made it 2changeme, or 3changeme.
[14:29.440 --> 14:33.600]  And so their topology was exactly the same as the original password was.
[14:34.300 --> 14:41.600]  Once you exclude that one data point, we see there's a huge commonality in which topologies
[14:41.600 --> 14:48.760]  are super common across industries. And these are companies in different sectors. They are,
[14:48.760 --> 14:54.760]  for the most part, US-centric. Some of them are global, but the majority of their user population
[14:54.760 --> 14:59.560]  was still English speakers. So that is one caveat. It's very possible that somebody with alternate
[14:59.560 --> 15:05.960]  language sets that users gravitate towards would look somewhat different. But if you knew
[15:05.960 --> 15:10.880]  nothing at all about a company other than that it had a substantial presence in the United States,
[15:10.880 --> 15:17.320]  take the top five common topologies on this list, attack those, and you're going to get tons of
[15:17.320 --> 15:25.060]  users. And as a defender, this is terrible, because even in the face of what we think
[15:25.060 --> 15:29.800]  of as good password strength policies, we know our users are going to land this way,
[15:29.800 --> 15:32.460]  and attackers are going to come after them and hunt them down and kill them.
[15:33.920 --> 15:39.280]  So just to recap the kind of things that all this data crunching told us to confirm our suspicion
[15:39.280 --> 15:46.840]  about the problem and where we ought to try to find a way to improve the situation.
[15:49.710 --> 15:54.770]  Users pick the lowest common denominator. There are specific things that they commonly do
[15:54.770 --> 16:01.090]  when told that the password policy is getting stronger. And those behaviors, although there
[16:01.090 --> 16:07.750]  are absolutely smart ways to attack an organization specific to that organization, go after word lists
[16:07.750 --> 16:13.190]  for their industry, go after the proper names of things in their hometown, you know, local
[16:13.190 --> 16:18.910]  sports teams and location names and blah, blah, blah. Still, these trends are going to be common
[16:20.230 --> 16:23.110]  no matter who and where the company is.
[16:25.130 --> 16:30.670]  And complexity rules just don't help, not nearly as much as we think they do anyway.
[16:33.310 --> 16:40.110]  By the way, another way to think about this in the COVID-19 era is users are not social distancing
[16:40.110 --> 16:48.490]  their passwords. To graphically depict what having 12% of your user population
[16:49.110 --> 16:55.050]  landing in a single topology means, if you were to imagine the land area of the...
[17:43.140 --> 17:45.660]  All right, I should be back. Sorry about that.
[17:46.940 --> 17:53.420]  So, if we were to imagine the entire land area of the United States as being the possible
[17:54.080 --> 17:57.880]  password key space that users could pick from,
[17:58.460 --> 18:04.780]  13% of all users choose to live together in a land area smaller than Manhattan.
[18:05.220 --> 18:08.840]  And what's worse is when you tell users to change their password,
[18:08.840 --> 18:13.920]  they don't go far. They just move a couple doors down or maybe one block away.
[18:14.120 --> 18:18.140]  So, any attacker knows they don't have to go far to hunt them down.
[18:20.540 --> 18:28.780]  And another thing about this is it applies even to much stronger password policies out of the box.
[18:30.120 --> 18:35.580]  We discussed this with some of our friends that work in places where... and this was going back
[18:35.580 --> 18:41.640]  to 2010. They had rules like you have to use a 15-character password and it has to have minimum
[18:41.640 --> 18:47.240]  two of everything. And they were like, so how strong is that really? And we said, well,
[18:47.880 --> 18:54.200]  it's really strong if you're not a smart attacker. But if you were a smart attacker and used,
[18:54.200 --> 19:01.880]  among other things, mask-based attacks targeting the most common topologies, you would have
[19:04.200 --> 19:08.460]  a surprisingly good success rate.
[19:11.890 --> 19:17.470]  If you were looking at a large organization with lots of users and password histories,
[19:18.450 --> 19:23.650]  and you picked the topology that had the most users in it,
[19:23.650 --> 19:27.930]  you would likely still succeed in cracking a password roughly once every 12 hours
[19:30.730 --> 19:37.390]  for however many millions of years. But you don't need more than a few to get you started.
[19:40.850 --> 19:48.870]  All right. So, that's kind of the foundational concept that led to what I'm going to talk about
[19:48.870 --> 19:56.930]  next. So, it became clear we need new defenses. We need new ways to make things harder.
[19:59.570 --> 20:03.210]  The rules that we have now aren't terrible, but they're not enough.
[20:04.110 --> 20:08.030]  And one of the most important things we need to do is figure out how
[20:08.030 --> 20:12.970]  to keep users from all congregating in the same place. No matter what rules we impose on them,
[20:12.970 --> 20:16.810]  human nature says the majority of those users are going to cluster in a few
[20:18.630 --> 20:26.770]  new places, which can still be predicted or discovered by the attacker. So, at a high level,
[20:26.770 --> 20:33.010]  what are some things we could do? We could blacklist topologies that we know are a problem,
[20:33.930 --> 20:39.890]  where we know or predict users are going to land. And not just blacklist individual passwords,
[20:39.890 --> 20:46.590]  but blacklist the topologies, the shapes of those passwords. Nobody is ever allowed, again,
[20:46.590 --> 20:49.850]  to have a password that's an uppercase letter followed by a bunch of lowercase,
[20:49.850 --> 20:55.730]  followed by a couple of numbers, followed by a special. Just can't happen. Another thing we can
[20:55.730 --> 21:02.490]  do is require a minimum change distance between where your current password is and where the new
[21:02.490 --> 21:07.970]  one is that you want to go to. So, whatever words you're using, if you're using a three-letter word
[21:07.970 --> 21:11.810]  and a four-letter word with the first letters capitalized, the second letters replaced with
[21:11.810 --> 21:17.790]  numbers, and a special character in between, you can't use that exact same pattern on your
[21:17.790 --> 21:27.270]  next password change. And also, don't allow your users to congregate on whatever topology they
[21:27.270 --> 21:33.690]  choose to congregate on. This is a term that we didn't know when we were thinking about this
[21:33.690 --> 21:39.990]  early on, but which is becoming one of the ways to describe this part, is dynamic password strength
[21:39.990 --> 21:46.590]  enforcement, meaning the password strength or the password policies you enforce are adaptive
[21:46.590 --> 21:55.130]  based on what your user population is doing. There are two big costs to this, which are key
[21:55.130 --> 22:04.720]  space reduction and user rebellion. I'll talk more about those in a bit. So first, what do I mean by
[22:04.720 --> 22:12.100]  blacklisting? It's not that complicated. We figure out what the most popular topologies are, usually
[22:12.700 --> 22:20.200]  for a given policy. Like, if your policy is 10 characters or more, it's not
[22:20.200 --> 22:25.680]  interesting to figure out what the most popular eight- and nine-character topologies are.
[22:25.900 --> 22:33.100]  And similarly, if you require four of four, then we don't care about blacklisting topologies
[22:33.100 --> 22:41.640]  that are three of four or two of four. By the way, whatever list you come up with as
[22:41.640 --> 22:47.300]  the best ones to blacklist are also the best ones to start with when you're a password
[22:47.300 --> 22:54.520]  cracker and nobody's enforcing PathWell yet. So we published what was our working set at
[22:54.520 --> 22:59.520]  the time, back in 2014. We and a bunch of others could probably contribute
[23:00.240 --> 23:06.800]  updated data by now. But with just about everything we figure out and publish,
[23:06.800 --> 23:12.100]  we assume bad actors had already figured these things out before we started.
[23:12.220 --> 23:17.200]  That's one of the reasons that KoreLogic started the Crack Me If You Can contest back in
[23:17.200 --> 23:25.720]  2010, because we figured bad actors were already figuring out and sharing, you know, password
[23:25.720 --> 23:30.040]  cracking tricks. And we wanted to bring more of that discussion out into the open for defenders
[23:30.040 --> 23:39.470]  and researchers to be aware of too. So what is the effectiveness of blacklisting?
[23:40.050 --> 23:46.010]  Well, if the attackers follow these exact techniques, and
[23:46.010 --> 23:50.910]  they use a top-N approach of targeting the topologies that are most likely to be successful,
[23:50.910 --> 23:55.590]  then when we take those top ones away, they're going to get zero cracks
[23:55.590 --> 24:00.810]  in their initial pass. Instead of getting 25% of the organization in minutes,
[24:00.810 --> 24:07.770]  they're going to get zero in the first hours and wonder what's up. Of course, once attackers figure
[24:07.770 --> 24:14.590]  out what we're doing, and once they're able to figure out what users do next, then maybe we're
[24:14.590 --> 24:21.010]  just pushing the problem around. You know, we blacklist these 100 most popular topologies,
[24:21.010 --> 24:26.190]  and now users are going to congregate in 100 different ones, and it's
[24:26.190 --> 24:31.270]  an arms race. We just, you know, go back and forth, maybe.
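A minimal sketch of topology blacklisting under stated assumptions: a complexity policy of at least 8 characters and 3 of 4 classes, and a tiny hypothetical blacklist (these are not the entries KoreLogic published). Following the point about policy relevance, `relevant_to_policy` drops blacklist entries the policy would already reject; all function names here are hypothetical.

```python
import string


def classes_of(password: str) -> list[str]:
    """Character class per position: 'u', 'l', 'd', or 's' (everything else)."""
    out = []
    for ch in password:
        if ch in string.ascii_uppercase:
            out.append("u")
        elif ch in string.ascii_lowercase:
            out.append("l")
        elif ch in string.digits:
            out.append("d")
        else:
            out.append("s")
    return out


def topology(password: str) -> str:
    """Hashcat-style mask such as '?u?l?l?d'."""
    return "".join("?" + c for c in classes_of(password))


def relevant_to_policy(topo: str, min_length: int, min_classes: int) -> bool:
    """Only topologies the policy would otherwise accept are worth blacklisting."""
    tokens = topo[1::2]  # the class letters after each '?'
    return len(tokens) >= min_length and len(set(tokens)) >= min_classes


def is_allowed(password: str, blacklist: set[str],
               min_length: int = 8, min_classes: int = 3) -> bool:
    classes = classes_of(password)
    if len(classes) < min_length or len(set(classes)) < min_classes:
        return False  # fails the ordinary complexity policy
    return topology(password) not in blacklist


# Hypothetical candidate entries, NOT the published PathWell list.
candidates = ["?u?l?l?l?l?l?d?d?s", "?u?l?l?l?l?l?l?d?d", "?u?l?l?l?l?d?d"]
blacklist = {t for t in candidates if relevant_to_policy(t, 8, 3)}
print(is_allowed("Summer19!", blacklist), is_allowed("sUmmer19!", blacklist))
```

The last candidate is only 7 positions long, so under an 8-character minimum it is pruned as irrelevant rather than blacklisted.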
[24:34.170 --> 24:39.710]  The next thing we can do, which is starting to get more adaptive, is a minimum topology
[24:39.710 --> 24:48.350]  change at password change time. So without any kind of enforcement of topology change,
[24:48.530 --> 24:52.550]  a user who just increments a number in their password gets a perfectly acceptable new password.
[24:55.410 --> 25:00.050]  And similarly, even once we're doing that, even if we enforce the thing that says, hey,
[25:00.050 --> 25:05.950]  you can't use the same topology that you did in your previous password, they're probably going
[25:05.950 --> 25:14.170]  to make the smallest change that constitutes a topology change, such as pick a random letter
[25:14.170 --> 25:19.690]  that was uppercase and flip it to lower, or pick a lower and flip it to upper. That would pass
[25:19.690 --> 25:26.830]  the test. But it would still be lower down on an attacker's list than incrementing a number,
[25:26.830 --> 25:33.730]  or possibly incrementing a letter. And more importantly, we can say,
[25:33.730 --> 25:40.250]  well, let's figure out how we can measure the difference, the distance between
[25:40.250 --> 25:48.010]  one password and another, and then what's a minimum distance that we want to enforce?
[25:51.300 --> 25:59.000]  So, one of our people who is, well, way more educated than I am in the computer science realm
[25:59.000 --> 26:06.300]  said, oh, that's Levenshtein distance. And I said, what's that? It was what I wanted without knowing
[26:06.300 --> 26:13.480]  it. It's a way to measure the distance between two strings: how many edits,
[26:13.480 --> 26:28.460]  how many inserts, removals, or substitutions it takes to get from one to the other. So,
[26:31.580 --> 26:38.940]  for plain string changes, you measure the Levenshtein distance between two
[26:38.940 --> 26:44.140]  strings by counting how many characters were changed, added, or removed.
[26:45.580 --> 26:51.660]  For topologies, we can do much the same thing. So when we look at
[26:53.000 --> 26:58.280]  these couple of hypothetical password changes, if you keep all of the character
[26:58.280 --> 27:04.380]  types the same and just increment or decrement one of them, your topology's Levenshtein
[27:04.380 --> 27:10.920]  distance change is zero. But if you change the character class of one of the positions,
[27:10.920 --> 27:17.600]  then you've successfully, you know, moved your topology from one to another.
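The two distances can be compared in a short sketch: the same textbook Levenshtein routine run once over characters and once over per-position character classes. The minimum topology distance of 1 in the hypothetical `change_allowed` helper is an assumed threshold for illustration, not necessarily what PathWell enforces.

```python
import string


def levenshtein(a, b) -> int:
    """Classic edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def classes_of(password: str) -> list[str]:
    """Per-position class: 'u', 'l', 'd', or 's' for anything else."""
    return ["u" if c in string.ascii_uppercase
            else "l" if c in string.ascii_lowercase
            else "d" if c in string.digits
            else "s"
            for c in password]


def topology_distance(old: str, new: str) -> int:
    """Levenshtein distance between the two passwords' class sequences."""
    return levenshtein(classes_of(old), classes_of(new))


def change_allowed(old: str, new: str, min_topology_distance: int = 1) -> bool:
    return topology_distance(old, new) >= min_topology_distance


for new in ["password21!", "Password20!", "passwords20!"]:
    print(new, levenshtein("password20!", new), topology_distance("password20!", new))
```

All three candidates are one string edit away from `password20!`, but only the case flip and the inserted letter move the topology; the incremented digit does not.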
[27:19.340 --> 27:26.380]  And obviously, if you combine more than one change at a time,
[27:26.380 --> 27:31.040]  the edit distance of your string and the Levenshtein distance of your topology
[27:31.040 --> 27:36.760]  may be different. And if you go about making multiple changes at once,
[27:36.760 --> 27:41.200]  then it's not going to be the very next thing the attacker tries.
[27:41.200 --> 27:45.940]  When they learn that your password used to be password20!,
[27:45.940 --> 27:53.020]  passwords20! won't be the first thing they try. It might be the 10th.
[27:53.020 --> 27:58.180]  It gets more interesting the more positions you change throughout the string.
[28:00.260 --> 28:06.040]  And every time you make a topology change, you multiply out the number of different
[28:06.040 --> 28:13.080]  paths the attacker has to go down in order to try to find you. Okay, so this still isn't really
[28:13.080 --> 28:19.800]  wear leveling. Wear leveling, at least as I borrowed the term, comes from
[28:19.800 --> 28:28.380]  the way things like solid-state disks, SSDs, try to spread their writes out across all of the
[28:28.380 --> 28:32.180]  different memory cells before they come back to one that was previously written to and write to
[28:32.180 --> 28:38.420]  it again. So in this case, we want to take all these users who've bunched up in specific buckets
[28:38.420 --> 28:46.160]  of a specific topology and spread them out. The ideal wear-leveled population of user
[28:46.160 --> 28:53.740]  passwords would have on average one or fewer users per bucket, or at least
[28:54.460 --> 28:57.940]  no more users in any particular bucket than any other.
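One way to sketch wear leveling, under the assumption that the system keeps a count of current passwords per topology: refuse a new password whose topology bucket is already at a cap, with the cap set near users divided by available topologies. The `TopologyLeveler` class and `ideal_cap` helper are hypothetical; whether PathWell tracks counts this way is not stated here.

```python
import math
import string
from collections import Counter
from typing import Optional


def topology(password: str) -> str:
    """Hashcat-style mask; non-alphanumerics count as ?s (assumption)."""
    return "".join("?u" if c in string.ascii_uppercase
                   else "?l" if c in string.ascii_lowercase
                   else "?d" if c in string.digits
                   else "?s"
                   for c in password)


def ideal_cap(n_users: int, n_topologies: int) -> int:
    """Smallest per-bucket cap that still fits everyone if spread evenly (at least 1)."""
    return max(1, math.ceil(n_users / n_topologies))


class TopologyLeveler:
    """Hypothetical wear-leveling check: refuse topologies that are already 'full'."""

    def __init__(self, max_per_topology: int):
        self.max_per_topology = max_per_topology
        self.counts: Counter = Counter()

    def try_change(self, old_password: Optional[str], new_password: str) -> bool:
        new_t = topology(new_password)
        old_t = topology(old_password) if old_password is not None else None
        if new_t != old_t and self.counts[new_t] >= self.max_per_topology:
            return False  # bucket full: the user must pick a less crowded shape
        if old_t is not None:
            self.counts[old_t] -= 1  # assumes the old password was registered here
        self.counts[new_t] += 1
        return True


leveler = TopologyLeveler(max_per_topology=ideal_cap(n_users=3, n_topologies=4 ** 9))
print(leveler.try_change(None, "Summer19!"), leveler.try_change(None, "Winter20!"))
```

With the cap at 1, the second user is pushed off the `?u?l?l?l?l?l?d?d?s` shape and has to choose something less crowded.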
[28:59.900 --> 29:09.360]  That way, if an attacker tries to pick any random topology there, or rather, any topology that they
[29:09.360 --> 29:14.560]  know humans are likely to gravitate towards, they'll get few, if any, cracks instead of,
[29:14.560 --> 29:24.200]  oh, 5% of your users landed on that topology. So we have to think about what the impacts are
[29:24.200 --> 29:30.500]  to an attacker of using wear leveling effectively and uniformly. Now,
[29:30.500 --> 29:34.360]  there are downsides to doing it too, and I'll get to those in a second. But
[29:36.580 --> 29:43.880]  if the attacker is used to being able to target the first, or the top five, topologies and
[29:43.880 --> 29:51.480]  get, you know, 10% of users per topology, 50% of your entire population in just, you know,
[29:51.480 --> 30:00.160]  10 topologies or less, if users were spread completely out, then you would end up with
[30:00.160 --> 30:04.980]  something like six orders of magnitude more work to get the same number of cracks, or
[30:05.820 --> 30:13.540]  six orders of magnitude fewer cracks using the same amount of guessing time. Now, in reality,
[30:13.540 --> 30:20.420]  it's not going to be that good. And it could be even worse. Suppose the attacker knew somehow
[30:21.120 --> 30:27.120]  exactly which topologies were in use within your organization, then they could target just
[30:27.120 --> 30:33.400]  those topologies and ignore all the ones that aren't in use. However, because we know that
[30:33.400 --> 30:38.340]  their success rate in any given topology is now minimized; it's controlled. There's not going to
[30:38.340 --> 30:44.720]  be more than one user, or more than your number of users divided by the number of topologies,
[30:45.460 --> 30:53.440]  in that particular topology, so their success rate still drops by two to three orders of magnitude.
[30:53.740 --> 30:58.100]  In the realistic case, users spread out, but it's not like you're arbitrarily
[30:58.100 --> 31:01.640]  assigning them a topology. If that were the case, you may as well assign them a password,
[31:01.640 --> 31:07.700]  and that's going to be terrible, and everyone will kill you. But in the realistic case,
[31:07.700 --> 31:12.900]  you're probably looking at a four to five orders of magnitude change in work factor.
[31:13.540 --> 31:19.960]  And again, by that, I mean, surprisingly to us, it works out more or less symmetrically. If you say,
[31:20.480 --> 31:24.740]  I used to spend this many hours and crack this many thousand passwords,
[31:24.740 --> 31:30.140]  I want to spend that same number of hours and see how many passwords I crack, you will crack
[31:31.020 --> 31:36.400]  one ten-thousandth or so of the passwords you used to crack in the same amount of time.
[31:36.620 --> 31:42.660]  And at the same, by the same token, if you say, I used to crack this many thousands of passwords,
[31:42.660 --> 31:47.220]  I want to crack that same percentage or that same number, no matter how long it takes,
[31:47.220 --> 31:52.120]  it's going to take you 10,000 to 100,000 times as long as it used to.
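A toy Python model makes the clumping argument concrete. Everything in it is a hypothetical assumption rather than data from the talk: a 100,000-user organization with half its users packed into 10 topologies, an attacker who exhausts topologies from most to least popular, and an equal cost to exhaust every topology (real masks differ enormously in size). The helper `cracks_after` is an illustrative name.

```python
def cracks_after(counts, k):
    """Users cracked if the attacker fully exhausts the k most-populated topologies."""
    return sum(sorted(counts, reverse=True)[:k])

N = 100_000  # hypothetical organization size

# Clumped: 10 topologies hold 5,000 users each (50%); the rest are one per topology.
clumped = [5_000] * 10 + [1] * (N - 50_000)
# Spread: a maxuse-style cap of 1 forces every user into a distinct topology.
spread = [1] * N

for k in (1, 10, 100):
    c, s = cracks_after(clumped, k), cracks_after(spread, k)
    print(f"top {k:>3} topologies: clumped={c:>6}  spread={s:>3}  ratio={c / s:,.0f}x")
```

With an equal per-topology cost, "fewer cracks for the same topologies tried" and "more topologies tried for the same cracks" are two views of one ratio. The toy only shows the direction of the effect, not the four-to-five orders of magnitude estimate above.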
[31:54.860 --> 32:03.000]  So what are some costs here? The one that first comes to sort of mathematical minds is,
[32:03.000 --> 32:08.240]  I'm blacklisting parts of the keyspace. That means I'm making it so that there are fewer possible
[32:08.240 --> 32:12.900]  passwords users could choose. Oh my God, we're reducing our keyspace. That helps the attacker
[32:12.900 --> 32:18.460]  because now they have less ground to cover. But once you do the math, that actually
[32:19.600 --> 32:27.180]  almost disappears into nothing. So for instance, say you are talking about eight-character password
[32:27.180 --> 32:34.840]  length. In reality, there's almost no world and no hash in which eight-character passwords
[32:34.840 --> 32:39.220]  are okay anymore, but to keep the math simple, start with eight characters. We're going to have
[32:39.220 --> 32:49.060]  4^8 possible different topologies: 65,536, or 64K. If we blacklist the 100 most popular topologies,
[32:49.720 --> 32:57.820]  that's only around 0.15% of the possible topologies. And in fact,
[32:57.820 --> 33:02.160]  if you are talking about nine-, 10-, or 11-character passwords,
[33:02.160 --> 33:09.800]  the percentage of your keyspace that you're deleting by blacklisting is even smaller,
[33:09.800 --> 33:16.120]  because you're talking about the top 100 out of a million instead of out of 65,536.
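The topology-count arithmetic is easy to check. This sketch assumes four character classes (upper, lower, digit, special), so length-n passwords have 4^n topologies, and it measures the blacklisted share by number of topologies; measured by individual passwords, the share depends on which topologies are on the list. `blacklisted_fraction` is an illustrative name.

```python
def blacklisted_fraction(length, blacklisted=100, classes=4):
    """Share of all length-n topologies removed by blacklisting `blacklisted` of them."""
    return blacklisted / classes ** length

for n in (8, 10):
    total = 4 ** n
    print(f"length {n}: {total:,} topologies, {blacklisted_fraction(n):.4%} blacklisted")
```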
[33:17.140 --> 33:24.800]  Now, the thing about forcing unique topology use, or at least spreading users uniformly among
[33:24.800 --> 33:31.560]  the topologies that are not blacklisted, is that you actually do have a problem where any given
[33:31.560 --> 33:38.000]  randomly selected topology is now more likely to have a password in it than before you imposed
[33:38.000 --> 33:48.260]  this rule. But think about what we're getting rid of by doing that:
[33:48.260 --> 33:54.480]  the property where an attacker can find one topology that has five to 10% of
[33:54.480 --> 34:00.400]  your users, or even just one topology that has 1% of your users.
[34:00.400 --> 34:09.240]  The difference is still massively in our favor; this is an improvement for the defender.
[34:10.980 --> 34:19.020]  Now, there's another possible downside. As Perth said, you're going to have
[34:19.020 --> 34:25.440]  violence against your security staff. Well, maybe. On the one hand, any new control that
[34:25.440 --> 34:32.460]  adds work for users is going to be resisted one way or the other. One of the
[34:32.460 --> 34:38.680]  things you could do about this is measure it, for example with pilot testing. Carnegie Mellon actually
[34:38.680 --> 34:47.140]  did a usability study on the difficulty of different, stronger password-strength policies,
[34:47.140 --> 34:52.660]  and their results were: yes, it matters, but it's not as bad as you might think. That's the TLDR of
[34:52.660 --> 35:00.480]  it. They also found that there are specific ways you can address it with user hinting. Now, what are ways
[35:00.480 --> 35:04.960]  you could hint to users? Hey, your password isn't strong enough; if you adjust it this way, it would
[35:04.960 --> 35:11.640]  become strong enough. Or literally, your password is too weak, make it this instead. Those are
[35:11.640 --> 35:16.480]  decisions that, you know, we can work our way through and different organizations can choose
[35:16.480 --> 35:25.200]  differently. But it appears from CMU's research that some flavor of that can drastically
[35:26.240 --> 35:30.060]  reduce user complaints. You're going to have cranky users,
[35:30.060 --> 35:38.540]  but you're not going to have 100% of your users get cranky. Now,
[35:39.180 --> 35:43.420]  we always have to remember that if we're going to give extra information to users at password
[35:43.420 --> 35:51.160]  change time, that information might somehow become hints to the attacker.
[35:52.740 --> 35:57.140]  Some of those things we've thought through and figured out ways to address.
[35:57.140 --> 36:00.820]  Others are just going to be tradeoffs that we have to know we're making.
[36:03.820 --> 36:09.300]  When an organization chooses, for instance, what hint level to use,
[36:09.300 --> 36:13.440]  it's going to have to keep that in mind. All right. So, enough of that. The code.
[36:14.180 --> 36:20.220]  So, a few years back now, we released a PAM library, a PAM module that implements all this
[36:20.220 --> 36:25.400]  stuff with optional controls to enable it and set the different parameters.
[36:28.760 --> 36:38.860]  We developed and tested it on multiple Linuxes: Gentoo, Ubuntu, probably some of the Red Hats,
[36:38.860 --> 36:42.320]  and I believe Solaris, but don't quote me on that.
[36:43.480 --> 36:46.960]  And it basically implements all the things that I just talked about, and I'll
[36:46.960 --> 36:50.940]  talk a little more about specifically how in a second.
[36:52.020 --> 36:57.680]  We did patent the ideas here, but the code is AGPL. You can use it for free all you want.
[36:57.680 --> 37:03.820]  The TLDR of the way the AGPL works is you can use this all you want for free stuff and open
[37:03.820 --> 37:08.740]  source stuff. If you want to make money off of it, or make custom modifications that you make money
[37:08.740 --> 37:14.780]  off of, then you have to talk to us, because the AGPL license won't work. You'll have to ask us for a
[37:14.780 --> 37:21.920]  dual license on something or other, but that's fine. We did it as a PAM library or PAM module
[37:21.920 --> 37:28.700]  for Linux deployment, but with the thought that the API and the library could be used by other
[37:28.700 --> 37:34.000]  things too, like an LDAP server that implements single sign-on or what have you.
[37:36.300 --> 37:46.180]  Yes, so Perth is thinking about how attackers would adapt. I actually have that in here in a little bit too.
[37:47.600 --> 37:52.300]  Okay, so what are the different modes that we implemented in the PathWell open source code?
[37:52.300 --> 38:03.100]  So audit mode is simply the tracking of topologies that are used as users change their passwords.
[38:03.100 --> 38:07.140]  When you start out in an organization, you don't know which topologies are in use, other than by cracking the existing password
[38:07.140 --> 38:12.100]  base, which is an option, especially for us, but it might not be an option for everybody. So you
[38:12.100 --> 38:19.640]  could enable this to record topology use as you go. Now, if you're going to keep a database
[38:19.640 --> 38:23.760]  of all your users' topologies, you'd better consider that database pretty sensitive
[38:24.400 --> 38:30.280]  and be careful with it. And because of implementation details in our proof of concept,
[38:30.840 --> 38:35.780]  it's currently limited to tracking the topologies of passwords up to 29
[38:35.780 --> 38:40.040]  characters long, which ought to be enough for anybody.
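Audit mode can be pictured as a counter keyed by topology. This is a minimal Python sketch, not PathWell's implementation or storage format: the class `TopologyAudit`, the one-letter class codes (`u`, `l`, `d`, `s`, mirroring hashcat's `?u ?l ?d ?s` charsets), lumping non-ASCII characters in with specials, and dropping a user's old topology on each change are all assumptions. Only topologies are stored, never passwords, but the user-to-topology mapping is still the sensitive data described above.

```python
import string
from collections import Counter

MAX_TRACKED_LENGTH = 29  # mirrors the proof-of-concept limit mentioned in the talk

def topology(password):
    """One class letter per character: u(pper), l(ower), d(igit), s(pecial/other)."""
    def cls(ch):
        if ch in string.ascii_uppercase:
            return "u"
        if ch in string.ascii_lowercase:
            return "l"
        if ch in string.digits:
            return "d"
        return "s"  # assumption: everything else, including non-ASCII, counts as special
    return "".join(cls(ch) for ch in password)

class TopologyAudit:
    """Hypothetical in-memory stand-in for the audit database. Treat it as sensitive."""

    def __init__(self):
        self.counts = Counter()  # topology -> number of accounts currently using it
        self.current = {}        # user -> topology of that user's current password

    def record_change(self, user, new_password):
        """Move `user` out of their old topology bucket and into the new one."""
        old = self.current.pop(user, None)
        if old is not None:
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        topo = topology(new_password)
        if len(topo) > MAX_TRACKED_LENGTH:
            return None  # beyond the tracked length: not recorded
        self.current[user] = topo
        self.counts[topo] += 1
        return topo
```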
[38:42.200 --> 38:49.720]  Then there are enforcement modes, which actually impose different controls. The first and most
[38:49.720 --> 38:53.320]  obvious one I'm going to talk about is blacklisting. And it's basically everything I said before that
[38:53.320 --> 39:00.220]  blacklisting should do, it does. We distribute a standard starter list of
[39:00.560 --> 39:07.740]  topologies to blacklist, but you can also add your own or modify it. And again, I'll reiterate,
[39:07.740 --> 39:15.720]  blacklisting by itself is not enough as a permanent solution, but it does
[39:17.720 --> 39:24.280]  make you better and more resistant than other organizations that somebody might attack.
[39:25.120 --> 39:29.760]  You don't have to run faster than the bear;
[39:29.760 --> 39:33.720]  you just have to run faster than the other guy.
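Blacklist mode only needs the candidate password itself. This minimal sketch renders a candidate as a hashcat-style mask and rejects it if that mask is listed. The three entries are hypothetical examples, not PathWell's distributed starter list, and `to_mask` and `blacklist_ok` are illustrative names.

```python
import string

def to_mask(password):
    """Hashcat-style mask of a password, e.g. 'Pass12!' -> '?u?l?l?l?d?d?s'."""
    def cls(ch):
        if ch in string.ascii_uppercase:
            return "?u"
        if ch in string.ascii_lowercase:
            return "?l"
        if ch in string.digits:
            return "?d"
        return "?s"  # assumption: everything else is treated as special
    return "".join(cls(ch) for ch in password)

# Hypothetical example entries; PathWell ships its own starter list, not reproduced here.
BLACKLIST = {
    "?u?l?l?l?l?l?d?d",        # e.g. Summer19
    "?u?l?l?l?l?l?d?d?d?d",    # e.g. Summer2019
    "?u?l?l?l?l?l?l?l?d?s",    # e.g. Password1!
}

def blacklist_ok(password, blacklist=BLACKLIST):
    """Reject a candidate whose topology is blacklisted; needs no audit data."""
    return to_mask(password) not in blacklist
```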
[39:34.360 --> 39:40.580]  The next enforcement option is minlev: the minimum Levenshtein (edit) distance between the
[39:40.580 --> 39:45.920]  topology of the new password candidate and the topology of the previous
[39:45.920 --> 39:53.340]  password. Actually, I should mention that neither blacklist mode nor minlev mode requires
[39:53.340 --> 40:00.240]  auditing to be turned on. So if you have heartburn about the idea of maintaining a
[40:00.240 --> 40:06.060]  database of in-use topologies, and I don't blame you, these features don't need that at all,
[40:06.060 --> 40:12.240]  because they only care about what your new password request is in the case of blacklist,
[40:12.240 --> 40:16.440]  or what your new password is compared to your old password in the case of minlev.
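Minlev only needs the old and new passwords. A sketch, assuming topologies are compared as strings of one-letter class codes using standard Levenshtein distance (unit-cost insertion, deletion, and substitution); `minlev_ok` is a hypothetical name, and the real module's handling may differ.

```python
import string

def topology(password):
    """One class letter per character: u(pper), l(ower), d(igit), s(pecial/other)."""
    def cls(ch):
        if ch in string.ascii_uppercase:
            return "u"
        if ch in string.ascii_lowercase:
            return "l"
        if ch in string.digits:
            return "d"
        return "s"
    return "".join(cls(ch) for ch in password)

def levenshtein(a, b):
    """Classic dynamic-programming edit distance; every edit costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, free if equal
        prev = cur
    return prev[-1]

def minlev_ok(old_password, new_password, minlev):
    """Require the new topology to be at least `minlev` edits from the old one."""
    return levenshtein(topology(old_password), topology(new_password)) >= minlev
```

Note that "Summer2019!" to "Autumn2020!" changes every character but not the topology, so it scores 0 and fails any positive minlev.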
[40:16.440 --> 40:21.340]  They don't care about what your new password is compared to the entire rest of your organization,
[40:22.240 --> 40:28.880]  which is what maxuse is about. Maxuse lets you set a threshold:
[40:28.880 --> 40:33.040]  any given topology bucket can only have one user, or only five users,
[40:33.040 --> 40:36.820]  or whatever's the right number for the size of your organization.
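Maxuse is the check that does need audit data: it compares the candidate's topology against how many accounts already occupy that bucket. In this sketch `counts` stands in for the audit database (an assumed shape, not PathWell's API), and the requesting user's own current password counts too, so with a threshold of 1 a user cannot keep their existing topology.

```python
import string
from collections import Counter

def topology(password):
    """One class letter per character: u(pper), l(ower), d(igit), s(pecial/other)."""
    def cls(ch):
        if ch in string.ascii_uppercase:
            return "u"
        if ch in string.ascii_lowercase:
            return "l"
        if ch in string.digits:
            return "d"
        return "s"
    return "".join(cls(ch) for ch in password)

def maxuse_ok(new_password, counts, maxuse):
    """Allow the change only if fewer than `maxuse` accounts already use that topology."""
    return counts.get(topology(new_password), 0) < maxuse

# Hypothetical audit data: one account already sits in the Summer2019! topology.
counts = Counter({topology("Summer2019!"): 1})
print(maxuse_ok("Winter2020!", counts, maxuse=1))
print(maxuse_ok("w1nter-2020", counts, maxuse=1))
```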
[40:39.920 --> 40:45.280]  And again, the point here is that if you want to prevent users
[40:45.280 --> 40:53.240]  from clumping anywhere, you use
[40:53.240 --> 40:59.560]  this, and then a user can't set their password to a topology that
[40:59.560 --> 41:06.280]  they or anybody else in your organization already uses, or that more than two people already use, or whatever
[41:06.280 --> 41:15.920]  your setting is. Now, the next one is an option that matters in enforcement
[41:15.920 --> 41:23.240]  mode, but it isn't an enforcement itself; rather, it controls the user experience
[41:24.680 --> 41:32.460]  if you want to give any hints. So, we did an experimental hint info level implementation,
[41:32.460 --> 41:38.260]  partly to facilitate the kind of research I talked about before on user acceptance:
[41:38.820 --> 41:42.960]  what's the user experience when I tell them this much information, as opposed to when I tell them
[41:42.960 --> 41:53.060]  that much? So, out of the box, PathWell doesn't give users any particular feedback or information,
[41:53.060 --> 41:57.160]  doesn't tell them how they could make their password stronger, doesn't leak information
[41:57.160 --> 42:02.360]  about the organization, doesn't give information to the screen that would be useful to somebody
[42:02.360 --> 42:08.300]  shoulder surfing. But if you choose to, if it's right for your organization, you can turn up
[42:08.300 --> 42:14.740]  hint levels. And this is supported by the backend API as well, not just by the PAM library. So,
[42:14.740 --> 42:22.640]  if you wanted to plug this stuff into a non-PAM single sign-on server, go for it.
[42:23.280 --> 42:30.320]  Right now, hints are only hooked up for blacklist violations, but the other types
[42:31.120 --> 42:37.360]  should be doable without a problem too. Now, I'm not going to do a live demo, but I'm going to show
[42:37.360 --> 42:47.160]  some examples of how this works in practice on a Linux box. So, you install the PAM library
[42:47.900 --> 42:53.980]  and you modify your pam.d settings. And we have READMEs and examples for the different distributions
[42:53.980 --> 43:01.420]  that we supported and tested it on, but you can also roll your own custom, you know, PAM settings.
[43:01.960 --> 43:08.780]  So, you do a thing to turn on audit mode, you do a thing to turn on blacklist or minlev or maxuse
[43:08.780 --> 43:15.680]  or any combination thereof, and you can also enable the hint level. So again,
[43:17.780 --> 43:23.580]  maxuse requires that audit be turned on, but the other enforcement modes don't require audit to be
[43:23.580 --> 43:32.200]  enabled. So, what does it look like when things happen? Well, because this is
[43:32.200 --> 43:38.420]  a beta, we're verbose. We don't include secrets in the output, but
[43:38.420 --> 43:44.680]  even in the case of success, we log a bunch of info about the fact
[43:44.680 --> 43:53.480]  that we just accepted a password change. A failure will tell the user that it failed a specific
[43:53.480 --> 43:57.480]  check, the minlev check in this case, but it won't tell them anything else. It won't tell them
[43:57.480 --> 44:03.300]  to use this one instead, and it won't tell them
[44:03.300 --> 44:12.100]  how to modify their candidate. It'll just tell them
[44:12.100 --> 44:22.800]  they failed, and we log that. And then there's a maxuse failure, where in this case minlev was
[44:22.800 --> 44:28.920]  not turned on, so users were allowed to choose any new password they wanted, any new topology
[44:28.920 --> 44:34.000]  they wanted, as long as it wasn't in use by somebody. Well, it was in use by them, so
[44:34.560 --> 44:38.920]  they can't reuse the same one, but they also can't reuse the same topology as their neighbor.
[44:40.140 --> 44:47.160]  And we log that. But again, we don't give away anything else.
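Putting the modes together: this sketch runs whichever checks are enabled, refuses a maxuse setting without audit data (mirroring the audit dependency described above), and, like the failures just described, names only the check that failed without echoing the candidate. The function name, keyword arguments, and message strings are hypothetical, not PathWell's actual configuration options or log output.

```python
import string
from collections import Counter

def topology(password):
    """One class letter per character: u(pper), l(ower), d(igit), s(pecial/other)."""
    def cls(ch):
        if ch in string.ascii_uppercase:
            return "u"
        if ch in string.ascii_lowercase:
            return "l"
        if ch in string.digits:
            return "d"
        return "s"
    return "".join(cls(ch) for ch in password)

def levenshtein(a, b):
    """Classic dynamic-programming edit distance; every edit costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def check_change(old_password, new_password, *, blacklist=frozenset(),
                 minlev=0, maxuse=0, audit_counts=None):
    """Run whichever checks are enabled (0 or empty disables one); return (ok, message)."""
    if maxuse and audit_counts is None:
        raise ValueError("maxuse needs audit data, so audit mode must be enabled")
    topo = topology(new_password)
    if topo in blacklist:
        return False, "password change rejected: blacklist check failed"
    if minlev and old_password is not None:
        if levenshtein(topology(old_password), topo) < minlev:
            return False, "password change rejected: minlev check failed"
    if maxuse and audit_counts.get(topo, 0) >= maxuse:
        return False, "password change rejected: maxuse check failed"
    return True, "password change accepted"

counts = Counter({topology("Summer2019!"): 1})  # hypothetical audit data
print(check_change("Summer2019!", "Autumn2020!", minlev=2))
print(check_change(None, "Autumn2020!", maxuse=1, audit_counts=counts))
```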
[44:48.740 --> 44:58.970]  Now, here's a little of what the hint-level stuff looks like. You can, if you want,
[44:59.750 --> 45:05.450]  look at the topology of the password that they supplied and say, that's a topology that we're
[45:05.450 --> 45:13.390]  not going to allow. Let's figure out what change would be viable. And the
[45:13.390 --> 45:19.950]  viable change is basically selected at random at password change time
[45:19.970 --> 45:24.950]  by the engine. It says, okay, this topology they asked for is not allowed.
[45:25.090 --> 45:29.390]  What are the neighbors of this topology that are allowed? I'm going to pick one of those at random
[45:29.390 --> 45:35.670]  and then suggest an edit that would land the user in the new allowed topology. Now, funny enough,
[45:35.670 --> 45:41.970]  what it's suggesting here is actually a topology that wouldn't be great on its own. But for
[45:41.970 --> 45:48.790]  instance, maybe the organization's minimum password length is such that by making this
[45:48.790 --> 45:54.410]  longer, they'll end up being an outlier that way, because most users won't make their password any
[45:54.410 --> 46:05.550]  longer than they need to. Jack up the hint level one more, and now we'll actually show them
[46:06.130 --> 46:13.870]  where: drawing on their existing password,
[46:13.870 --> 46:17.730]  we point at the spot in their existing password and say, put a this here,
[46:17.730 --> 46:31.140]  put a that there. And going even further, we can suggest specific things. You can say,
[46:32.620 --> 46:37.220]  here, try replacing this with a that and try inserting a this here.
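The hint engine described above (find allowed neighbors of the rejected topology, pick one at random, then express it at increasing levels of detail) might be sketched like this. The one-edit neighborhood (single insertion or substitution), the symbol set used for suggestions, and the wording of each hint level are assumptions for illustration; all function names are hypothetical. Passing a seeded `random.Random` makes the choice reproducible.

```python
import random
import string

# Characters used when suggesting a concrete edit; "s" is an assumed subset of symbols.
CLASSES = {"u": string.ascii_uppercase, "l": string.ascii_lowercase,
           "d": string.digits, "s": "!@#$%^&*-_=+?"}
CLASS_NAMES = {"u": "uppercase letter", "l": "lowercase letter",
               "d": "digit", "s": "symbol"}

def topology(password):
    """One class letter per character: u(pper), l(ower), d(igit), s(pecial/other)."""
    def cls(ch):
        for tag in ("u", "l", "d"):
            if ch in CLASSES[tag]:
                return tag
        return "s"
    return "".join(cls(ch) for ch in password)

def one_edit_neighbors(topo):
    """(edit, new_topology) pairs one insertion or substitution away from `topo`."""
    out = []
    for i in range(len(topo) + 1):
        for c in CLASSES:
            out.append((("insert", i, c), topo[:i] + c + topo[i:]))
            if i < len(topo) and topo[i] != c:
                out.append((("replace", i, c), topo[:i] + c + topo[i + 1:]))
    return out

def suggest(password, is_allowed, rng=random):
    """Pick a random allowed neighbor topology and apply that edit to the password."""
    options = [(e, t) for e, t in one_edit_neighbors(topology(password)) if is_allowed(t)]
    if not options:
        return None
    (kind, i, c), new_topo = rng.choice(options)
    ch = rng.choice(CLASSES[c])
    rest = password[i + 1:] if kind == "replace" else password[i:]
    return kind, i, ch, new_topo, password[:i] + ch + rest

def render_hint(level, suggestion):
    """Escalating hints: 0 = no detail, 1 = target shape, 2 = where, 3 = exact edit."""
    if level <= 0 or suggestion is None:
        return "Password rejected."
    kind, i, ch, new_topo, new_pw = suggestion
    if level == 1:
        return "Try a password shaped like " + "".join("?" + c for c in new_topo)
    if level == 2:
        action = "insert" if kind == "insert" else "change the character to"
        return f"At position {i + 1}, {action} a {CLASS_NAMES[new_topo[i]]}."
    # Level 3 echoes the resulting password: the shoulder-surfing trade-off discussed here.
    action = "insert" if kind == "insert" else "put"
    return f"At position {i + 1}, {action} {ch!r}; your password would become {new_pw}"
```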
[46:37.940 --> 46:43.860]  And your password is going to end up as this. Classically, back in the day,
[46:43.860 --> 46:52.660]  nobody would ever suggest echoing a plaintext password back to the human. But it turns out there are
[46:52.660 --> 46:57.840]  actually use cases and scenarios where that's reasonable, or at least where it's reasonable to
[46:57.840 --> 47:02.900]  let the user choose. That's why a lot of web apps these days have a little button
[47:02.900 --> 47:09.300]  you can click or hold to reveal the password, rather than always keeping it opaque. It's always just
[47:09.420 --> 47:13.840]  a trade-off. If it's right for your organization, it's right for your organization. I'm not going
[47:13.840 --> 47:19.780]  to judge you until I come and pen test it and own you, and then I'll judge you. So what's next for
[47:19.780 --> 47:25.380]  the PathWell project? Well, like I said, hints are only implemented for one of the modes, so
[47:25.380 --> 47:31.340]  the easy one is to go ahead and add hint support for the other modes.
[47:32.820 --> 47:41.560]  More platforms. The working implementation, like I said, is Linux PAM. But the bigger
[47:41.560 --> 47:46.300]  organizations, places where this is going to be more useful, are going to be running either AD
[47:46.300 --> 47:53.200]  or some other kind of large single sign-on platform. If a vendor of one of those platforms
[47:53.200 --> 47:58.580]  wants to work on making their product better than all other products in the universe, come talk to
[47:58.580 --> 48:06.160]  us. The other thing, too, is we can easily come up with more ways we could improve things, more
[48:06.160 --> 48:19.560]  enforcement options. First of all, this is highly focused on one specific, very successful,
[48:19.560 --> 48:25.360]  but still only one, aspect of password cracking: the mask attack. So
[48:25.360 --> 48:31.420]  are there specific things we could learn from other highly successful password-cracking
[48:32.400 --> 48:38.160]  methodologies that we could, in turn, turn into defensive enforcement, dynamic strength
[48:38.160 --> 48:43.400]  enforcement at password change time? Say, well, I know that the thing you just asked for is going
[48:43.400 --> 48:48.740]  to fall victim to this specific attack. We can compensate for that attack pretty easily and not
[48:48.740 --> 48:57.400]  allow you to make that requested password change. Now, an easier, more straightforward
[48:57.400 --> 49:03.100]  thing would be regular expression support. Right now, our blacklists
[49:03.100 --> 49:08.760]  are basically just masks tokenized into a machine format; we really
[49:08.760 --> 49:17.240]  just have a list of masks. We could enforce regexes, too. So you could make a regex that
[49:17.240 --> 49:22.260]  would disallow a huge variety of different ways to say the word DEF CON in your password
[49:22.920 --> 49:29.500]  all at once. And do that with whatever your company name is, or with exactly the
[49:29.500 --> 49:35.700]  words you find are most common in your organization's passwords, because
[49:35.700 --> 49:40.820]  you do regular password audits, because you're a smart customer of ours. You could say, okay, well,
[49:40.820 --> 49:47.380]  my company name and my city name and my sports team name, all of those and every variation that
[49:47.380 --> 49:52.260]  would match these regexes, I'm going to disallow. And now, you've taken away
[49:52.260 --> 49:57.940]  not one blacklisted password or 10 blacklisted passwords, but millions of terrible passwords
[49:57.940 --> 50:04.420]  all at once. And then, if this did get adopted, what would attackers do next?
[50:04.420 --> 50:13.300]  And what would we need to do to adapt to that? So that's pretty much it. I'll take questions
[50:13.300 --> 50:16.740]  now. First, I'll scroll back to look for questions,
[50:16.740 --> 50:21.380]  but then I'll take more and I'll be around and in Discord the rest of the event.
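The regular-expression blacklist floated earlier as future work could look something like this sketch using Python's `re` module. The leetspeak substitution table and the word list (`def con`, `acme`) are hypothetical, as are the function names; a real deployment would build them from what its own password audits turn up.

```python
import re

# Hypothetical substitution table; tune it to what your own audits find.
LEET = {"a": "a@4", "e": "e3", "i": "i1!", "o": "o0", "s": "s5$", "t": "t7", "c": "c("}

def word_pattern(word):
    """Regex fragment matching `word` with leet substitutions and optional separators."""
    parts = []
    for ch in word.lower():
        if ch.isspace():
            parts.append(r"[\W_]*")  # 'def con', 'def-con', 'defcon' all match
        else:
            parts.append("[" + re.escape(LEET.get(ch, ch)) + "]")
    return "".join(parts)

def build_blacklist(words):
    """One case-insensitive regex covering every variant of every listed word."""
    return re.compile("|".join(word_pattern(w) for w in words), re.IGNORECASE)

BLACKLIST_RE = build_blacklist(["def con", "acme"])  # hypothetical word list

def regex_ok(password):
    """Reject a candidate containing any blacklisted word variant anywhere."""
    return BLACKLIST_RE.search(password) is None
```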
